Overview

For this assignment I have used data available through ‘Inside Airbnb’ to help individuals who are looking to invest in a property in New York with a view to renting it out through the Airbnb platform. The information in this report is intended to help answer questions such as:

  • “How many listings are in my neighbourhood and where are they?”
  • “How much are hosts making from renting to tourists (compare that to long-term rentals)?”
  • “How many houses and apartments are being rented out frequently to tourists and not to long-term residents?”

Inside Airbnb is an independent, non-commercial set of tools and data that allows you to explore how Airbnb is really being used in cities around the world. Using this data, the target audience of this report can draw on key metrics to see how Airbnb is being used to compete with the residential housing market.

Inside Airbnb data is publicly available and covers the Airbnb market in many cities around the world. The table below provides an overview of the fields available in the listings data set for New York City.

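# Load the packages used throughout this report
library(tidyverse)   # read_csv(), %>%, dplyr verbs, ggplot2
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(scales)      # comma axis labels
library(leaflet)     # interactive map

# Load the field description csv file
airbnb_description <- read_csv("description.csv", col_names = TRUE)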

# Convert to DF
df_airbnb_description <- data.frame(airbnb_description)

df_airbnb_description %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Field                            Description
id                               Unique ID per listing
name                             Name and description of listed room
host_id                          Unique host ID
host_name                        Host name
neighbourhood_group              New York neighbourhood grouping name (Bronx, Brooklyn, Manhattan, Queens, Staten Island)
neighbourhood                    New York sub-neighbourhood name
latitude                         Latitude co-ordinate
longitude                        Longitude co-ordinate
room_type                        Room type (Private room, Entire home/apt, Shared room, Hotel room)
price                            Price per night in US$
minimum_nights                   Minimum number of nights to book
number_of_reviews                Number of reviews for listing
last_review                      Date of last review
reviews_per_month                Average number of reviews received per month
calculated_host_listings_count   Number of listings the host has in New York
availability_365                 Number of days available out of 365

Data Preparation

# Load the New York listings csv file
ny_airbnb_listings <- read_csv("newyork_airbnb_listings.csv", col_names = TRUE)
# Take a look at the data
str(ny_airbnb_listings)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 48377 obs. of  16 variables:
##  $ id                            : num  3647 3831 5022 5099 5121 ...
##  $ name                          : chr  "THE VILLAGE OF HARLEM....NEW YORK !" "Cozy Entire Floor of Brownstone" "Entire Apt: Spacious Studio/Loft by central park" "Large Cozy 1 BR Apartment In Midtown East" ...
##  $ host_id                       : num  4632 4869 7192 7322 7356 ...
##  $ host_name                     : chr  "Elisabeth" "LisaRoxanne" "Laura" "Chris" ...
##  $ neighbourhood_group           : chr  "Manhattan" "Brooklyn" "Manhattan" "Manhattan" ...
##  $ neighbourhood                 : chr  "Harlem" "Clinton Hill" "East Harlem" "Murray Hill" ...
##  $ latitude                      : num  40.8 40.7 40.8 40.7 40.7 ...
##  $ longitude                     : num  -73.9 -74 -73.9 -74 -74 ...
##  $ room_type                     : chr  "Private room" "Entire home/apt" "Entire home/apt" "Entire home/apt" ...
##  $ price                         : num  150 89 80 200 60 79 79 116 150 135 ...
##  $ minimum_nights                : num  3 1 10 3 45 2 2 30 1 5 ...
##  $ number_of_reviews             : num  0 279 9 75 49 443 118 94 161 54 ...
##  $ last_review                   : Date, format: NA "2019-08-29" ...
##  $ reviews_per_month             : num  NA 4.62 0.1 0.59 0.39 3.51 0.97 0.73 1.32 0.43 ...
##  $ calculated_host_listings_count: num  1 1 1 1 1 1 1 1 4 1 ...
##  $ availability_365              : num  365 192 0 13 0 246 0 347 0 40 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   id = col_double(),
##   ..   name = col_character(),
##   ..   host_id = col_double(),
##   ..   host_name = col_character(),
##   ..   neighbourhood_group = col_character(),
##   ..   neighbourhood = col_character(),
##   ..   latitude = col_double(),
##   ..   longitude = col_double(),
##   ..   room_type = col_character(),
##   ..   price = col_double(),
##   ..   minimum_nights = col_double(),
##   ..   number_of_reviews = col_double(),
##   ..   last_review = col_date(format = ""),
##   ..   reviews_per_month = col_double(),
##   ..   calculated_host_listings_count = col_double(),
##   ..   availability_365 = col_double()
##   .. )
summary(ny_airbnb_listings)
##        id               name              host_id         
##  Min.   :    3647   Length:48377       Min.   :     2438  
##  1st Qu.: 9699559   Class :character   1st Qu.:  8288419  
##  Median :20322645   Mode  :character   Median : 33067672  
##  Mean   :19893435                      Mean   : 72458150  
##  3rd Qu.:30343546                      3rd Qu.:117088883  
##  Max.   :38568081                      Max.   :294184975  
##                                                           
##   host_name         neighbourhood_group neighbourhood         latitude    
##  Length:48377       Length:48377        Length:48377       Min.   :40.50  
##  Class :character   Class :character    Class :character   1st Qu.:40.69  
##  Mode  :character   Mode  :character    Mode  :character   Median :40.72  
##                                                            Mean   :40.73  
##                                                            3rd Qu.:40.76  
##                                                            Max.   :40.92  
##                                                                           
##    longitude       room_type             price         minimum_nights    
##  Min.   :-74.24   Length:48377       Min.   :    0.0   Min.   :   1.000  
##  1st Qu.:-73.98   Class :character   1st Qu.:   69.0   1st Qu.:   1.000  
##  Median :-73.96   Mode  :character   Median :  105.0   Median :   2.000  
##  Mean   :-73.95                      Mean   :  152.7   Mean   :   7.425  
##  3rd Qu.:-73.93                      3rd Qu.:  175.0   3rd Qu.:   5.000  
##  Max.   :-73.71                      Max.   :10000.0   Max.   :1250.000  
##                                                                          
##  number_of_reviews  last_review         reviews_per_month
##  Min.   :  0.00    Min.   :2011-03-28   Min.   : 0.010   
##  1st Qu.:  1.00    1st Qu.:2018-08-24   1st Qu.: 0.190   
##  Median :  5.00    Median :2019-07-25   Median : 0.730   
##  Mean   : 24.12    Mean   :2018-11-30   Mean   : 1.385   
##  3rd Qu.: 25.00    3rd Qu.:2019-08-29   3rd Qu.: 2.040   
##  Max.   :654.00    Max.   :2019-09-12   Max.   :67.600   
##                    NA's   :9651         NA's   :9651     
##  calculated_host_listings_count availability_365
##  Min.   :  1.000                Min.   :  0.0   
##  1st Qu.:  1.000                1st Qu.:  0.0   
##  Median :  1.000                Median : 47.0   
##  Mean   :  8.153                Mean   :114.1   
##  3rd Qu.:  2.000                3rd Qu.:252.0   
##  Max.   :387.000                Max.   :365.0   
## 
head(ny_airbnb_listings)
## # A tibble: 6 x 16
##      id name  host_id host_name neighbourhood_g~ neighbourhood latitude
##   <dbl> <chr>   <dbl> <chr>     <chr>            <chr>            <dbl>
## 1  3647 THE ~    4632 Elisabeth Manhattan        Harlem            40.8
## 2  3831 Cozy~    4869 LisaRoxa~ Brooklyn         Clinton Hill      40.7
## 3  5022 Enti~    7192 Laura     Manhattan        East Harlem       40.8
## 4  5099 Larg~    7322 Chris     Manhattan        Murray Hill       40.7
## 5  5121 Blis~    7356 Garon     Brooklyn         Bedford-Stuy~     40.7
## 6  5178 Larg~    8967 Shunichi  Manhattan        Hell's Kitch~     40.8
## # ... with 9 more variables: longitude <dbl>, room_type <chr>,
## #   price <dbl>, minimum_nights <dbl>, number_of_reviews <dbl>,
## #   last_review <date>, reviews_per_month <dbl>,
## #   calculated_host_listings_count <dbl>, availability_365 <dbl>
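
The summary() output above reports 9,651 missing values in last_review and reviews_per_month, and a minimum price of 0. The following is a minimal sketch of how such gaps could be counted before charting. It uses a small made-up data frame (toy_listings is a hypothetical stand-in, not the real file), so the numbers it prints are illustrative only.

```r
# Toy stand-in for ny_airbnb_listings (values are illustrative only)
toy_listings <- data.frame(
  id                = c(1, 2, 3, 4),
  price             = c(150, 0, 80, 200),
  reviews_per_month = c(NA, 4.62, 0.10, NA)
)

# Count missing values in each column
na_counts <- colSums(is.na(toy_listings))

# Count listings that have a price of zero
zero_price_count <- sum(toy_listings$price == 0)

print(na_counts)
print(zero_price_count)
```

Running the same two checks on ny_airbnb_listings would show which columns need care before they are averaged or plotted.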

Chart Characteristics Template

# Load the Windows font Calibri (windowsFonts() is only available on Windows)
windowsFonts("Calibri" = windowsFont("Calibri"))

# Create RC chart attributes
rc_chartattributes1 <- theme_bw() +
                        theme(text=element_text(family="Calibri")) +
                        theme(panel.border = element_blank(),
                              panel.grid.major = element_blank(),
                              panel.grid.minor = element_blank(),
                              axis.line = element_line(colour = "gray"),
                              axis.ticks.x = element_blank(),
                              axis.ticks.y = element_blank(),
                              plot.title = element_text(color = "black", size = 28, face = "bold"),
                              plot.subtitle = element_text(color = "gray45", size = 16),
                              plot.caption = element_text(color = "gray45", size = 12, face = "italic", hjust = 0),
                              legend.position="bottom")

Chart 1: Rooms by Type by Neighbourhood

# Group data by neighbourhood group and room type 
ny_airbnb_listings_room_number <- ny_airbnb_listings %>%
    group_by(neighbourhood_group, room_type) %>%
    tally()

# Number of rooms by type by neighbourhood
bar_chart_nh_room_type <- ggplot(data = ny_airbnb_listings_room_number) +
                                  geom_bar(aes(x = neighbourhood_group, y = n, group = room_type, fill = room_type), stat="identity", alpha = 1) + 
                                  labs(title = "New York Airbnb room listings by neighbourhood", 
                                      subtitle = "Manhattan has the most rooms listed with c.21,000, with a majority being 'Entire home/apt'", 
                                      caption = "Source: http://insideairbnb.com/get-the-data.html",
                                      x = "Neighbourhood group", 
                                      y = "Number of rooms",
                                      fill = "Room type") + 
                                  scale_y_continuous(labels = comma) +
                                  scale_color_manual(values = c("#173F5F", "#3CAEA3", "#F6D55C", "#ED553B")) +
                                  scale_fill_manual(values = c("#173F5F", "#3CAEA3", "#F6D55C", "#ED553B")) +
                                  rc_chartattributes1
               
bar_chart_nh_room_type

Explanation:

  • The chart presented above shows all Airbnb listings in New York by room type and Neighbourhood group
  • There are 5 Neighbourhood groups in New York, being the Bronx, Brooklyn, Manhattan, Queens and Staten Island
  • There are 4 room type classifications in New York, being ‘Entire home/apt’, ‘Hotel room’, ‘Private room’, and ‘Shared room’

Key Insights:

  • Manhattan has the most listings in New York with 21,183, followed by Brooklyn with 19,856
  • Staten Island has the fewest listings with only 359
  • In Manhattan, the most common room listing type is ‘Entire home/apt’ with 12,828, followed by ‘Private room’ with 7,559
  • Across all of New York there are few listings for either ‘Hotel room’ or ‘Shared room’
  • Whilst demand is not known, there may be an opportunity in listing a ‘Shared room’ to cater to what may be an underserved market
  • In addition, the Bronx and Queens, which have far fewer listings than both Brooklyn and Manhattan, may represent good opportunities given their proximity to central New York
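
The counts quoted above come from grouping listings by Neighbourhood group and room type. As a minimal sketch of the same idea, the example below also derives each room type's share within its Neighbourhood group, which makes the mix easier to compare across groups of very different sizes. The toy_listings data frame and the share column are assumptions made for illustration and are not part of the original analysis.

```r
library(dplyr)

# Toy stand-in for ny_airbnb_listings (values are illustrative only)
toy_listings <- data.frame(
  neighbourhood_group = c("Manhattan", "Manhattan", "Manhattan",
                          "Brooklyn", "Brooklyn"),
  room_type           = c("Entire home/apt", "Entire home/apt", "Private room",
                          "Private room", "Shared room")
)

# Listings per Neighbourhood group and room type, plus each room type's
# share of the listings in its Neighbourhood group
room_mix <- toy_listings %>%
  count(neighbourhood_group, room_type) %>%
  group_by(neighbourhood_group) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

print(room_mix)
```

Here count() produces the same n column as group_by() followed by tally(), so the result could feed the bar chart in the same way.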

Chart 2: Room Price Distribution by Type by Neighbourhood

# Average room price by group_neighbourhood
ny_airbnb_listings_nh_mean <- ny_airbnb_listings %>%
    group_by(neighbourhood_group) %>%
    summarise(price = round(mean(price), 2))

# Density plot of room price by type by neighbourhood
density_price_nh <- ggplot(data = ny_airbnb_listings) +
                            geom_density(aes(x = price, color = room_type, fill = room_type), position = "identity", alpha = 0.3) +
                            labs(title = "Distribution of New York neighbourhood prices by room type", 
                                        subtitle = "Manhattan exhibits the highest average price, driven by having a greater mix of 'Entire home/apt' type of rooms", 
                                        caption = "Source: http://insideairbnb.com/get-the-data.html",
                                        x = "Price (Log10 transformation)", 
                                        y = "Density",
                                        color = "Type of room",
                                        fill = "Type of room") + 
                            scale_color_manual(values = c("#173F5F", "#3CAEA3", "#F6D55C", "#ED553B")) +
                            scale_fill_manual(values = c("#173F5F", "#3CAEA3", "#F6D55C", "#ED553B")) +
                            scale_x_log10() +
                            geom_vline(data = ny_airbnb_listings_nh_mean, aes(xintercept = price), linetype="dashed", color = "gray45") +
                            geom_text(data = ny_airbnb_listings_nh_mean,y = 3, aes(x = price + 1400, label = paste("Mean  = ",price)), color = "gray45", size = 4) +
                            facet_wrap(~neighbourhood_group, nrow=1) +
                            rc_chartattributes1

density_price_nh

Explanation:

  • The chart presented above shows the distribution of Airbnb listing prices in New York by room type and Neighbourhood group
  • There are 5 Neighbourhood groups in New York, being the Bronx, Brooklyn, Manhattan, Queens and Staten Island
  • There are 4 room type classifications in New York, being ‘Entire home/apt’, ‘Hotel room’, ‘Private room’, and ‘Shared room’
  • The x-axis uses a log10 scale, given there are a few outliers which command a very high price per night
  • The density distributions per room type/neighbourhood show the spread in prices for that category
  • In addition, the mean price for the Neighbourhood group across all room types is shown
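
To illustrate why a log10 axis helps here, the sketch below applies log10 to the price quartiles and maximum reported in the summary() output above. It is a worked piece of arithmetic rather than part of the original analysis. It also shows that a price of zero (the minimum in the data) has no finite log10 value, which is why ggplot2 would be expected to drop such listings from the density plot with a warning.

```r
# Price quartiles and maximum taken from the summary() output above
prices <- c(q1 = 69, median = 105, q3 = 175, max = 10000)

# On a linear axis the maximum sits roughly 57 times further out than the
# third quartile; on a log10 axis the gap is under two units
linear_ratio <- prices[["max"]] / prices[["q3"]]
log_gap      <- log10(prices[["max"]]) - log10(prices[["q3"]])

# A price of zero has no finite log10 value
log_of_zero <- log10(0)

print(round(log10(prices), 2))
print(linear_ratio)
print(log_gap)
print(log_of_zero)
```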

Key Insights:

  • Manhattan has the highest average price per room at $199.5, followed by Brooklyn at $122.9
  • The Bronx is the cheapest with an average of only $85.8 per night
  • All Neighbourhood groups appear to have a number of outliers in the $1,000+ price range
  • The Manhattan price average is driven by a number of factors: ‘Entire home/apt’ listings appear to be expensive, and there is a spike in ‘Hotel room’ prices around the $1,000 a night mark
  • Investors will need to be conscious of much lower room incomes when identifying opportunities outside of Manhattan and Brooklyn
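
Because the averages above are means, a handful of very expensive listings can pull them upwards. As a minimal sketch under assumed data, the example below compares the mean and median price per Neighbourhood group. The toy_listings values are invented to include one $1,000 outlier, and the median is an addition for illustration, not a figure used in this report.

```r
library(dplyr)

# Toy stand-in for ny_airbnb_listings with one high-priced outlier
toy_listings <- data.frame(
  neighbourhood_group = c("Manhattan", "Manhattan", "Manhattan",
                          "Bronx", "Bronx", "Bronx"),
  price               = c(100, 150, 1000, 60, 80, 100)
)

# Mean and median price per Neighbourhood group
price_summary <- toy_listings %>%
  group_by(neighbourhood_group) %>%
  summarise(mean_price   = round(mean(price), 2),
            median_price = median(price))

print(price_summary)
```

In the toy data the single outlier lifts the Manhattan mean well above its median, while the Bronx mean and median coincide.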

Chart 3: Map of Rooms Across New York

# Create room type palette
room_type_color <- colorFactor(c("#173F5F", "#3CAEA3", "#F6D55C", "#ED553B"), domain=c("Entire home/apt", "Hotel room", "Private room", "Shared room"))

# Create new price column to show relative sizes in chart
ny_airbnb_listings$price_scaled <- 0.001*(ny_airbnb_listings$price)

# Create map output 
newyork_map <- ny_airbnb_listings %>%
                leaflet(width = "100%") %>%
                      addProviderTiles(providers$Stamen.TonerBackground) %>% 
                      setView(-73.96, 40.72, zoom = 11) %>% 
                      addCircleMarkers(~longitude, ~latitude, 
                                       popup = paste("Name:", ny_airbnb_listings$name, "<br>",
                                                     "Type:", ny_airbnb_listings$room_type, "<br>",
                                                     "Price:", ny_airbnb_listings$price), 
                                       weight = 1, radius = ~price_scaled, 
                                       color = ~room_type_color(room_type), stroke = FALSE, fillOpacity = 0.4) %>%
                      addLegend("bottomright",
                                colors = c("#173F5F", "#3CAEA3", "#F6D55C", "#ED553B"),
                                labels = c("Entire home/apt", "Hotel room", "Private room", "Shared room"),
                                title = "Room types")

newyork_map

Explanation:

  • The map above shows all rooms across New York plotted using their lat/long co-ordinates
  • The rooms are coloured based on room type, being either ‘Entire home/apt’, ‘Hotel room’, ‘Private room’, or ‘Shared room’
  • The size of the bubble represents the price per night
  • A popup has been added to each room showing its name, room type and listing price, so that users selecting a room can see what else is available in an area they are considering investing in
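
The bubble size is set by price_scaled, which is 0.001 times the nightly price, and addCircleMarkers() interprets radius in pixels. The sketch below works through that arithmetic for the price quartiles and maximum from the summary() output, and contrasts it with one possible alternative. The scale_radius() helper, its pixel range and its price cap are hypothetical choices made for illustration, not the method used in this report.

```r
# Price quartiles and maximum taken from the summary() output above
prices <- c(69, 105, 175, 10000)

# Radius as computed in the map code above: 0.001 * price, in pixels
radius_linear <- 0.001 * prices

# Hypothetical alternative: cap the price, then apply square-root scaling
# into a fixed pixel range so that typical listings remain visible
scale_radius <- function(price, min_px = 2, max_px = 12, cap = 1000) {
  capped <- pmin(price, cap)  # limit the influence of extreme prices
  min_px + (max_px - min_px) * sqrt(capped / cap)
}
radius_sqrt <- scale_radius(prices)

print(radius_linear)
print(round(radius_sqrt, 2))
```

With the linear factor, listings at typical prices get a radius well under one pixel, so in practice only the most expensive listings would show a noticeable size difference.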

Key Insights:

  • Manhattan has the highest density of ‘Entire home/apt’ rooms, whilst the map shows a much higher skew towards ‘Private room’ listings in areas such as Brooklyn
  • The density of rooms available also becomes apparent; given the smaller size of Manhattan island there would be much more competition when looking to provide an Airbnb service, compared to parts of Queens and Brooklyn which would appear to be under-represented